Breast cancer is one of the major causes of death among women, and early
detection may decrease the aggressiveness of the disease. The goal of this
paper is to create an automated system that can classify digital mammogram
images as benign or malignant. The paper presents a new technique for
detecting microcalcifications (MCs) in mammogram images. The technique
uses a two-level segmentation process: first, the breast area is cropped
from the image using k-means clustering; then, an optimized region growing
(ORG) approach is applied, in which multiple seed points and thresholds are
generated optimally depending on the color values of the image pixels.
Texture features are then extracted based on Haralick's definitions of
texture analysis. In addition, three features (cross-correlation
coefficient, Pearson correlation, and average area of segmented spots) are
obtained from the segmented image. A support vector machine (SVM)
classifier is used to evaluate the efficiency of the system on images from
the digital database for screening mammography (DDSM) dataset. The results
were obtained using 355 images for training and 85 images for testing. The
proposed system reached a sensitivity of 97.05%, a specificity of 98.52%,
and an accuracy of 98.2%.
1. INTRODUCTION


One of the leading causes of female deaths worldwide is breast cancer. It has caused more deaths
than illnesses such as malaria or tuberculosis. The cancer research agency of the World Health
Organization (WHO), the International Agency for Research on Cancer (IARC), and the American Cancer
Society announced that 17.1 million new cancer cases were registered worldwide in 2018 [1]. An estimated
276,480 new cases of invasive breast cancer were expected to be diagnosed in women in the United
States in 2020, along with 48,530 new cases of non-invasive breast cancer [2].

In developed and emerging countries, citizens are shifting from traditional to modern lifestyles,
increasing the incidence of breast cancer among women, mainly between the ages of 35 and 55. It is possible
to control the incidence of breast cancer by detecting it in its early stages [3]. Screening techniques for
breast cancer include self and clinical breast examinations, magnetic resonance imaging (MRI), ultrasound,
and mammography [4]. The image produced through mammography is called a
mammogram, which consists of the background, breast region, fat tissue, breast masses, and
microcalcifications with high intensities [5]. As the demand for mammogram reading grows,
radiologists can make errors and overlook important clues due to fatigue [6].

Microcalcifications (MCs) are deposits of calcium that appear in a mammogram as tiny bright spots.
Because MCs and masses look similar to the surrounding tissue in the mammogram, their identification and
classification are difficult. Image processing methods therefore play a significant role in the earlier
diagnosis of MCs, and researchers have developed many strategies to determine the precise position of MCs
and masses [6].

This paper introduces a pixel-based region growing technique. Because calcifications can be
sporadic, multiple seed points are used to detect calcifications in more than one region of the breast.
This cannot be achieved with the standard region growing algorithm, which in addition needs an initial
seed point, increasing computing cost and execution time.

The structure of this paper is as follows. A literature review is presented in section 2. Section 3
provides a detailed overview of the proposed methodology for data collection, pre-processing, denoising,
segmentation, feature extraction, and classification. The performance analysis that demonstrates the
segmentation and classification efficiency of the proposed system is given in section 4. The conclusion of
this work is given in section 5.


2. LITERATURE REVIEW 

Rouhi et al. [7] proposed two approaches for breast mass segmentation using region growing.
Texture features were collected and a neural classifier was used to differentiate
malignant and benign mammograms. The reported sensitivity, accuracy, and precision were 96.87%, 95.94%,
and 96.47%, respectively. Sambandam and Jayaraman [8] suggested a self-adaptive segmentation
technique for multilevel thresholding based on dragonfly optimization, where optimum thresholds are created
using a swarm optimization approach.

Alam et al. [9] proposed a method that first uses a wavelet-based algorithm to enhance the region of
interest, followed by morphological operations and interpolation methods for image segmentation.
Sub-regions were then obtained by splitting the initial image, and bicubic interpolation was used to obtain
the intensity level of the local background. Finally, the difference image is obtained by subtracting the
interpolated image from the original image, and an area-ranking technique is used to cluster
microcalcifications. Liu and Zeng [10] suggested an optimized region-growing methodology called multiple
concentric layers (MCL) in order to increase accuracy, and achieved a sensitivity of 82.4% when tested on a
collection of 164 mammograms.

Anitha and Peter [11] suggested a method that eliminates noise and artifacts using Wiener
filtering, followed by background separation using morphological operations, global thresholding, a
kernel-based level set, and fuzzy clustering to segment the mass after the region of interest is defined.
Ibeni et al. [12] provided a full Bayesian approach to evaluating the predictive distribution of all classes
using three classifiers, naive Bayes (NB), Bayesian networks (BN), and tree augmented naive Bayes (TAN),
with three datasets: breast cancer, breast cancer Wisconsin, and breast tissue. The prediction accuracy of
the Bayesian methods was then compared with three common machine-learning algorithms: K-nearest neighbor
(K-NN), support vector machine (SVM), and decision tree (DT). The findings indicated that the Bayesian
networks (BN) algorithm had the best performance, with an accuracy of 97.281%.

Touil et al. [13] suggested a new conditional region growth (CRG) method for
determining accurate MC boundaries starting from a set of seed points. Regional maxima detection and
superpixel analysis are used to find the initial seed points. The region growing step is governed by a set
of criteria, in terms of contrast and shape variation, adapted to MC detection. These criteria are generated
from prior knowledge that characterizes MCs and are separated into two groups. The first concerns the size
of the search neighborhood. The second concerns the analysis of gradient information and shape evolution
during the growing process.


3. PROPOSED METHOD 

The proposed methodology involves multiple steps built on image processing techniques. The first
step is image acquisition from the digital database for screening mammography (DDSM), where regular and
irregular mammograms are collected. The mammograms are then pre-processed with Gaussian filters for noise
reduction. The images are further processed using the proposed multi-point (seed) region growing method
to extract the region of interest (ROI), which targets breast MCs. For feature extraction, the ROIs are
then processed and a collection of texture features is extracted using Haralick texture characteristics.
The extracted features are then fed into the SVM classifier. A diagrammatic illustration of the proposed
procedure is shown in Figure 1.

Figure 1. The workflow of the proposed algorithm


3.1. Data acquisition 

The curated breast imaging subset of DDSM (CBIS-DDSM) dataset, available at [14], provides
the digital images used in the proposed technique. In this work, a series of 440 digital mammograms of
breast microcalcifications of benign and malignant severity in cranio-caudal (CC) views was used for
training and testing. This collection comprises 329 benign cases and 111 malignant cases in both the left
and right breast.


3.2. Pre-processing 

Pre-processing is regarded as a fundamental step in any image processing technique. Its
objective is to improve the image quality and the image characteristics that are necessary for
further processing. Mammogram images are difficult to interpret compared with other medical images, so
pre-processing is important [15]. The proposed method uses a Gaussian filter to pre-process the digital
mammograms, removing noise and smoothing the images. Figure 2(a) is the original image
and Figure 2(b) shows the effect of applying a Gaussian filter to the image.
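
As an illustration of the Gaussian smoothing step, the following is a minimal sketch in pure Python. The paper does not state the kernel size or standard deviation, so `sigma`, `radius`, the edge-replication border handling, and the toy image are all assumptions made for this example.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)  # clamp at borders
                acc += weight * row[xx]
            new_row.append(acc)
        result.append(new_row)
    return result

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    blurred_rows = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(blurred_rows), kernel))

# Toy 5x5 image (assumed data): flat background with one noisy bright pixel.
noisy = [[10.0] * 5 for _ in range(5)]
noisy[2][2] = 250.0
smooth = gaussian_blur(noisy)
```

The isolated spike is spread over its neighbors and its peak is reduced, which is the smoothing effect described above. A larger `sigma` gives stronger smoothing at the cost of blurring small bright structures.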


3.3. Pre-segmentation (denoising) 

Medical images usually contain symbols, words, or letters that indicate the type or some of the
medical-physical characteristics of the image. These are generally considered image noise and may affect
classification accuracy. Denoising is used to overcome this problem. The process goes through three
stages: K-means clustering, elimination of noise objects, and recovery of the important area.


3.3.1. K-means clustering 


K-means clustering is a method of grouping or partitioning patterns into multiple clusters such that
similar patterns are allocated to the same cluster. Clustering is used in many forms of analysis, including
the field of image segmentation. K-means clustering is an unsupervised algorithm and one of the most
widely used techniques [16]. At this stage, the image is divided into groups of areas with similar color
intensity, and as a result the background of the image is isolated more clearly from its components.
The noise in the image is also isolated from the rest of the image components, which makes it easier to
cut it out later. Figure 2(c) illustrates the result of applying K-means clustering.
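
The clustering of pixel intensities can be sketched as follows. This is a minimal one-dimensional version of Lloyd's algorithm in pure Python; the number of clusters `k`, the evenly spaced initialization, and the toy image are assumptions for illustration, since the paper does not report these settings.

```python
def kmeans_intensity(pixels, k=3, iterations=20):
    """Cluster scalar intensities with Lloyd's algorithm.

    Returns (centers, labels), where labels[i] is the cluster of pixels[i].
    """
    lo, hi = min(pixels), max(pixels)
    # Deterministic initialization: centers spread evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins the nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c]))
                  for p in pixels]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members
                               else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Toy image (assumed data): dark background, mid-gray tissue, bright marks.
image = [
    [0, 0, 5, 120, 130],
    [0, 3, 110, 125, 250],
    [2, 0, 115, 128, 245],
]
flat = [p for row in image for p in row]
centers, labels = kmeans_intensity(flat, k=3)
```

In this toy case the three clusters correspond to the background, the tissue, and the very bright marks, so the background can be separated by selecting the cluster with the lowest center.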


3.3.2. Eliminate noise objects 

A morphological operation rearranges the order of pixel values. It operates on a structuring element
and an input image; structuring elements are templates that probe features of interest. Erosion is the
essential operation used here: during erosion, the minimum value is chosen by comparing all the pixel
values in the region of the input image covered by the structuring element [17]. After the image has been
divided in the previous step into areas of similar intensity and isolated from the background, the extra
foreign objects that cause noise in the image are cut off. The result is an image that contains only the
breast area without any noise, as shown in Figure 2(d). This area, which we call the breast area mask, is
used to restore the breast area from the original image.
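
A minimal sketch of erosion on a binary mask is given below. The square structuring element, its size (`radius=1`, i.e. 3x3), the treatment of pixels outside the image as background, and the toy mask are assumptions for illustration; the paper does not specify them.

```python
def erode(mask, radius=1):
    """Erosion with a square structuring element of side 2 * radius + 1.

    Each output pixel is the minimum of the input pixels under the element;
    positions outside the image are treated as 0 (background).
    """
    height, width = len(mask), len(mask[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lowest = mask[y][x]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < height and 0 <= xx < width
                    lowest = min(lowest, mask[yy][xx] if inside else 0)
            result[y][x] = lowest
    return result

# Toy mask (assumed data): a 5x5 "breast" block and one isolated label pixel.
mask = [
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
]
eroded = erode(mask)
```

The isolated pixel disappears while the core of the large block survives. Erosion also shrinks the large block by one pixel on each side; whether the original system compensates for this (for example with a following dilation) is not stated in the paper.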


3.3.3. Recover the important area 

This is the last step in this stage. The mask produced in the previous step is applied to the
original image to restore the area covered by the mask; the remaining components of the
image are discarded and treated as background. Figure 2(e) shows the retrieved breast area.
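
Applying the mask can be sketched in a few lines. The function name `apply_mask`, the background value of 0, and the toy data are assumptions for this example.

```python
def apply_mask(image, mask, background=0):
    """Keep pixels where the mask is non-zero; set all others to background."""
    return [
        [pixel if keep else background
         for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]

# Toy data (assumed): the 255 at row 1, column 2 stands for a label artifact.
original = [[12, 200, 90], [15, 180, 255], [10, 170, 80]]
breast_mask = [[0, 1, 1], [0, 1, 0], [0, 1, 1]]
breast_only = apply_mask(original, breast_mask)
```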


Figure 2. The pre-segmentation steps: (a) original image, (b) Gaussian filter, (c) applying k-means,
(d) erosion filter, and (e) breast area retrieval


3.4. Segmentation 

Segmentation distinguishes the benign and malignant areas by splitting the digital
mammograms into non-overlapping segments separated from the background portions [2]. Region-based
strategies find a seed point and grow regions until a criterion of homogeneity is reached [18]. This paper
presents an effective variant of the pixel-based region growing technique that produces optimal seeds and
thresholds. Because calcifications may be sporadic, multiple seed points are used to locate
calcifications that occur in more than one area of the breast, which cannot be done with the traditional
region growing algorithm. Figure 3(a) shows the breast area retrieval, Figure 3(b) the segmented area,
and Figure 3(c) the ROI.


Figure 3. The segmentation process: (a) breast area retrieval, (b) segmented area, and (c) ROI


The following are the main steps for generating optimal seeds using multi-point region-growing
segmentation:

a. Input the digital mammogram image.

b. Let MammValues be a list of the mammogram pixel values (it contains every pixel value in the
image, without duplicates).

c. Sort MammValues in descending order.

d. Determine the segmentation threshold, which decides which values from the list are adopted in the
segmentation process. This is done by dividing the sorted MammValues into ten sections and adopting
the first section (the highest values), which represents the list of segmentation values or seed points.

e. For each pixel in the mammogram image, if the pixel value belongs to SegmentationValues, the
pixel is kept (set to white); otherwise, the pixel is discarded (set to black).

The algorithm for the multi-points region-growing method is presented in Algorithm 1: 


Algorithm 1: Optimized Region Growing

Input: Digital Mammogram Image (M).

Output: Segmented Image (S).

Begin
    W = Digital Mammogram Image width
    H = Digital Mammogram Image height
    MammValues<>: list of Mammogram pixel values
    SegmentationValues<>: list of segmentation threshold values
    Aggregation (M)
    For i ← 1 to W do
        For j ← 1 to H do
            If (! MammValues.Contains (M(i,j)))
                MammValues.Add (M(i,j))
            End
        End
    End
    Sort (MammValues)    // descending order
    Determine thresholds (MammValues)
    For i ← 1 to (MammValues.Count / 10) do
        SegmentationValues.Add (MammValues[i])
    End
    Segmentation (M, SegmentationValues)
    For i ← 1 to W do
        For j ← 1 to H do
            S.SetPixel (i, j, Color.Black)    // default: discarded
            For t ← 1 to SegmentationValues.Count do
                If (GetPixelValue (M(i,j)) == SegmentationValues[t])
                    S.SetPixel (i, j, Color.White)
                End
            End
        End
    End
    Return S
End
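
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative translation under stated assumptions, not the authors' C# implementation: the image is assumed to be a list of rows of grayscale integers, the function name and the `sections` parameter are introduced here, and white and black are represented as 255 and 0.

```python
def optimized_region_growing(image, sections=10):
    """Binary segmentation that keeps pixels whose value lies in the top
    1/sections of the distinct pixel values (255 = kept, 0 = discarded)."""
    # Aggregation: distinct pixel values, sorted in descending order.
    values = sorted({p for row in image for p in row}, reverse=True)
    # Threshold determination: adopt the first of `sections` parts.
    keep_count = len(values) // sections  # mirrors MammValues.Count / 10
    segmentation_values = set(values[:keep_count])
    # Segmentation: white if the pixel value is an adopted value, else black.
    return [[255 if p in segmentation_values else 0 for p in row]
            for row in image]

# Toy 4x5 image (assumed data) with the 20 distinct values 0..19.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
segmented = optimized_region_growing(image)
```

With 20 distinct values, the top tenth contains the two highest values (19 and 18), so only those two pixels are kept. Using a set for the adopted values makes the membership test constant time instead of a loop over the list for every pixel. Note that an image with fewer than ten distinct values yields an empty segmentation under this rule.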


3.5. Features extraction 
Image features describe the attributes and characteristics of the image. The extracted features used
for classification should also be identifiable, effective, and independent [19], [20]. In the first step of
feature extraction, statistical features (cross-correlation coefficient and Pearson correlation) are
extracted by comparing the intensities of the original image with those of the segmented image.
— Cross-correlation coefficient: a measure of the similarity of two series as a function of the
displacement of one relative to the other. Cross-correlation coefficients are more robust to changes of
illumination than the mean square error (MSE) [1].

$$\text{Cross-correlation coefficient} = \frac{\sum_k (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_k (x_k - \bar{x})^2 \sum_k (y_k - \bar{y})^2}} \qquad (1)$$

— Pearson correlation coefficient: evaluates whether there is statistical evidence for a linear
relationship between pairs of variables in the population, represented by a population correlation
coefficient. The Pearson correlation is a parametric measure [21].

$$\text{Pearson correlation coefficient} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \qquad (2)$$
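
The two correlation features can be sketched as follows, assuming the intensities of the two images have been flattened into equal-length sequences. The `lag` parameter and the function names are introduced here for illustration; at zero displacement the normalized cross-correlation has the same form as the Pearson coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                            * sum((y - mean_y) ** 2 for y in ys))
    return numerator / denominator if denominator else 0.0

def cross_correlation(xs, ys, lag=0):
    """Normalized cross-correlation of xs with ys displaced by lag samples."""
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    return pearson(a, b)

# Toy series (assumed data): ys is xs delayed by one sample.
xs = [1, 3, 2, 5, 4]
ys = [9, 1, 3, 2, 5]
aligned = cross_correlation(xs, ys, lag=1)
```

Here the coefficient at `lag=1` is 1 (up to rounding) because the overlapping parts of the two series are identical once the displacement is removed.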


Then, the average area of the segmented spots is calculated, which represents the average size of the
infected areas. Suppose that B = {B1, B2, ..., BN} is the set of segmented blobs, where N is the number of
blobs segmented from the mammogram.

$$\text{Average Area} = \frac{\sum_{i=1}^{N} \text{Area}(B_i)}{N} \qquad (3)$$
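
The average blob area can be sketched with a simple connected-component search. The paper does not state how blobs are delimited, so 8-connectivity, the pixel count as the area measure, and the toy image are assumptions for this example.

```python
from collections import deque

def blob_areas(binary, eight_connected=True):
    """Return the pixel count of every connected blob in a binary image."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    areas = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the blob containing (y, x).
            seen[y][x] = True
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas

def average_area(binary):
    """Mean blob area; 0.0 when the image contains no blobs."""
    areas = blob_areas(binary)
    return sum(areas) / len(areas) if areas else 0.0

# Toy segmented image (assumed data) with three blobs of area 3, 3, and 1.
segmented = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
mean_area = average_area(segmented)
```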

In the second step of feature extraction, the proposed technique uses a collection of texture
features based on Haralick's texture analysis concepts [22]. Twenty-six texture features are defined by the
proposed methodology, drawn from the following statistics: angular second moment, contrast, correlation,
variance, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance,
difference entropy, first information measure, and second information measure. Invariance was achieved for
each of these statistics by averaging them over the four directional co-occurrence matrices.
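
To illustrate the averaging over four directional co-occurrence matrices, the following sketch computes three of the listed statistics (angular second moment, contrast, and entropy). The pixel distance of 1, the symmetric normalized matrix, the number of gray levels, and the toy image are assumptions; the remaining statistics are omitted for brevity.

```python
import math

# Offsets (dy, dx) for the directions 0, 45, 90, and 135 degrees, distance 1.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def glcm(image, dy, dx, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts  # no pixel pairs exist for this offset
    return [[c / total for c in row] for row in counts]

def angular_second_moment(p):
    return sum(v * v for row in p for v in row)

def contrast(p):
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))

def entropy(p):
    return -sum(v * math.log(v) for row in p for v in row if v > 0)

def averaged_features(image, levels):
    """Average each statistic over the four directional matrices."""
    matrices = [glcm(image, dy, dx, levels) for dy, dx in OFFSETS]
    measures = {"asm": angular_second_moment,
                "contrast": contrast,
                "entropy": entropy}
    return {name: sum(fn(m) for m in matrices) / len(matrices)
            for name, fn in measures.items()}

# Toy 2x2 checkerboard (assumed data) with two gray levels.
checker = [[0, 1], [1, 0]]
features = averaged_features(checker, levels=2)
```

For the checkerboard, horizontal and vertical neighbors always differ while diagonal neighbors always match, so the direction-averaged contrast is 0.5 and the averaged angular second moment is 0.75.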

Thus, for training the SVM classifier, a collection of twenty-nine features was extracted (three
features from the first step and twenty-six features from the second step). A sample of the obtained
results compares some extracted features of the segmented images. Figures 4(a), (b), (c), and (d) show the
distance between the feature values of malignant and benign samples (angular second moment, variance,
first information measure, and cross-correlation coefficient, respectively) for a group of 30 segmented
samples. They illustrate that the extracted features can be used to distinguish the two classes: the
values for malignant tissue differ from the values for benign tissue.

Figure 4. Distance between feature values of malignant and benign samples: (a) angular second moment,
(b) variance, (c) first information measure, and (d) cross-correlation coefficient


3.6. Support vector machine classifier 
SVM is one of the most widely used classification algorithms. The working principle of the
support vector machine is mainly based on margin calculations [23], [24]. In this paper, the Gaussian kernel
function is used for transformation. The kernel function maps nonlinear samples into a high-dimensional
feature space, where it may be possible to separate samples that are not linearly separable in the
original space, which makes the classification easier [25].
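
As a brief illustration of the Gaussian kernel, the sketch below computes kernel values and the Gram matrix for a few feature vectors. The `gamma` value and the sample vectors are assumptions for this example; the paper does not report its kernel parameters.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def gram_matrix(samples, gamma=0.5):
    """Pairwise kernel values, as used when training a kernel SVM."""
    return [[rbf_kernel(u, v, gamma) for v in samples] for u in samples]

# Toy feature vectors (assumed data).
samples = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
K = gram_matrix(samples)
```

The kernel equals 1 for identical vectors and decays toward 0 as the vectors move apart, so nearby samples have a strong influence on each other and distant samples almost none.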


4. RESULTS AND DISCUSSION 

To calculate accuracy, sensitivity, and specificity, the confusion matrix defines the
terms true positive (TP), true negative (TN), false positive (FP), and false negative (FN) from the
predicted and ground truth results. Accuracy, sensitivity, and specificity, calculated as shown in (4)-(6),
measure the efficiency of the proposed method. A total of 440 images (329 benign cases and 111 malignant
cases in CC view) are used to evaluate the classification efficiency with ORG in the segmentation stage.
The images, selected randomly from the CBIS-DDSM dataset, are divided in advance into training and testing
sets: 355 images are used for training and 85 images are used to assess the proposed
method. The quality of classification can be determined as [26]:
— Sensitivity: the probability that a case is correctly identified as positive when the cancer is
present.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$

— Specificity: the probability that a case is correctly identified as negative when the cancer is
absent.

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (5)$$

— Accuracy: the proportion of all samples that are correctly identified.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$


In the classification process, these evaluations are expressed in terms of the parameters TP, TN, FP,
and FN [26]:
— TP: positive instance classified as positive. 
— TN: negative instance classified as negative. 
— FP: negative instance classified as positive. 
— FN: positive instance classified as negative. 
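
The three measures follow directly from the four counts, as the short sketch below shows. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the confusion matrix of the proposed system.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts: 10 positive and 20 negative test cases.
metrics = classification_metrics(tp=9, tn=18, fp=2, fn=1)
```

With these counts, 9 of 10 positives and 18 of 20 negatives are classified correctly, so all three measures equal 0.9.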

The proposed method was validated using an SVM classifier. It achieved high accuracy in
classifying the CBIS-DDSM dataset: with the proposed ORG and SVM, it reached 97.05% sensitivity,
98.52% specificity, and 98.2% accuracy in identifying benign and malignant samples. The ROC curve of the
proposed method is shown in Figure 5.

Figure 5. ROC curve of the classification results

The proposed system was implemented in C# using Visual Studio 2017 and the .NET Framework,
on a machine running 64-bit Windows 10 with a Core i7 processor and 8 GB of RAM. The
system processes images of the left and right breast in cranio-caudal (CC) views obtained from the
DDSM dataset. It first applies Gaussian filtering to pre-process the mammogram images,
then applies the denoising step, which consists of isolating the image background from its components
using K-means clustering, eliminating noise objects, and recovering the breast area. The optimized region
growing method is then used to segment the ROI that includes the MCs. Features of the segmented area are
extracted to distinguish benign from malignant digital mammogram images. The
experimental results of the proposed algorithm are compared with prior research in Table 1. On the DDSM
database, the proposed method compares favorably with the other approaches reported in the literature,
which suggests that it can support clinical experts in making diagnostic decisions.


Table 1. Comparison of the existing techniques with the proposed system

| References | Segmentation | Feature methods | Classifier | Dataset | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|
| Rouhi et al. [7] | Region growing optimized using GA adaptive threshold method | GLCM, contour-related, and morphological features | Random forest, NB, SVM, and KNN | MIAS and DDSM | - | - | 96.47% |
| Setiawan et al. [27] | Cropping | GLCM Laws texture features | FFNN | MIAS | - | - | 93.90% |
| Kashyap et al. [28] | Thresholding technique and fuzzy C-means | Shape features and Tamura | SVM using (RBFK) | Mini-MIAS | - | - | 96.92% |
| Patel and Sinha [29] | Region growing | Shape and texture | Multilayer perceptron neural network | DDSM | - | - | 95.6% |
| Shen et al. [30] | - | Pixel-level annotations | CNN | DDSM | 86.7% | 96.1% | - |
| Varela et al. [31] | Adaptive threshold method | Gray level and contour-related features | Backpropagation neural network | Images from hospitals in Santiago de Compostela's health district, Spain | 88% | - | - |
| Xie et al. [32] | Level set model | Gray-level features and textural features | ELM and SVM | Mini-MIAS + DDSM | - | - | 96.02% |
| Punitha et al. [3] | Dragonfly region growing optimization | GLCM and GLRLM texture features | FFNN using backpropagation | DDSM | 98.1% | 97.8% | 98% |
| Proposed system | Multi-seed points optimized region growing | Cross-correlation coefficient, Pearson correlation, the average area of segmented spots, and texture features based on Haralick definitions | SVM | DDSM | 97.05% | 98.52% | 98.2% |


5. CONCLUSION 

In this work, we presented a new optimized region growing method in which multiple seed points
are used to detect and segment microcalcifications (MCs) in mammographic images accurately. To
distinguish benign from malignant cases, a collection of twenty-nine features was extracted for training
the SVM classifier. The proposed multi-point (seed) region growing method combined with the SVM reached an
accuracy of 98.2%. The proposed method provides a highly accurate classification of benign and
malignant microcalcifications and can be used as a reference for radiologists and as a second opinion in
crucial cases. Future work should place further focus on the methods of feature collection and
extraction. In addition, evolutionary algorithms could be used to generate the optimized thresholds for
the region growing technique.